A graph based algorithm for generating EST consensus sequences

Authors

  • Ketil Malde
  • Eivind Coward
  • Inge Jonassen
Abstract

MOTIVATION: EST sequences constitute an abundant, yet error prone resource for computational biology. Expressed sequences are important in gene discovery and identification, and they are also crucial for the discovery and classification of alternative splicing. An important challenge when processing EST sequences is the reconstruction of mRNA by assembling EST clusters into consensus sequences.

RESULTS: In contrast to the more established assembly tools, we propose an algorithm that constructs a graph over sequence fragments of fixed size, and produces consensus sequences as traversals of this graph. We provide a tool implementing this algorithm, and perform an experiment where the consensus sequences produced by our implementation, as well as by currently available tools, are compared to mRNA. The results show that our proposed algorithm in a majority of the cases produces consensus of higher quality than the established sequence assemblers and at a competitive speed.

AVAILABILITY: The source code for the implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bioinformatics/

CONTACT: [email protected].
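The idea named in the abstract, a graph over fixed-size sequence fragments whose traversals yield a consensus, can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_kmer_graph` and `greedy_consensus`, the choice of overlapping k-mers as nodes, the edge weights, and the greedy heaviest-edge traversal are all assumptions made for illustration, and the reads are invented toy data.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Count how often each k-mer is directly followed by the next
    overlapping k-mer in any read (hypothetical helper)."""
    edges = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            src = read[i:i + k]
            dst = read[i + 1:i + k + 1]
            edges[src][dst] += 1
    return edges


def greedy_consensus(edges, start):
    """Walk from `start`, always taking the heaviest outgoing edge,
    and stop at a dead end or when a k-mer would be revisited."""
    path = start
    node = start
    seen = {start}
    while node in edges and edges[node]:
        # Heaviest edge first; ties broken alphabetically for determinism.
        nxt = min(edges[node], key=lambda n: (-edges[node][n], n))
        if nxt in seen:
            break
        seen.add(nxt)
        path += nxt[-1]
        node = nxt
    return path


# Toy reads: three agree, the last carries a single substitution error.
reads = [
    "ATGGCGTACG",
    "GCGTACGTTAGC",
    "ATGGCGTACGTTAGC",
    "ATGGCGAACGTTAGC",
]
graph = build_kmer_graph(reads, 4)
print(greedy_consensus(graph, "ATGG"))  # ATGGCGTACGTTAGC
```

In this toy case the erroneous read creates a lighter side branch that rejoins the main path, so the heaviest-edge walk recovers the majority sequence. A real assembler would also need to choose start nodes, handle repeats and alternative splice forms, and deal with reverse-complement reads, none of which this sketch attempts.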

Similar resources

Automated Clustering and Assembly of Large EST Collections

The availability of large EST (Expressed Sequence Tag) databases has led to a revolution in the way new genes are cloned. Difficulties arise, however, due to high error rates and redundancy of raw EST data. For these reasons, one of the first tasks performed by a scientist investigating any EST of interest is to gather contiguous ESTs and assemble them into a larger virtual cDNA. The REX (Recur...

A Review on Consensus Algorithms in Blockchain

Block chain technology is a decentralized data storage structure based on a chain of data blocks that are related to each other. Block chain saves new blocks in the ledger without trusting intermediaries through a competitive or voting mechanism. Due to the chain structure or the graph between each block with its previous blocks, it is impossible to change blocking data. Block chain architectur...

ESTminer: a suite of programs for gene and allele identification

UNLABELLED ESTminer is a collection of programs that use expressed sequence tag (EST) data from inbred genomes to identify unique genes within gene families. The algorithm utilizes Cap3 to perform an initial clustering of related EST sequences to produce a consensus sequence of a gene family. These consensus sequences are then used to collect all ESTs in the original EST library that are relate...

Parallelization of MIRA Whole Genome and EST Sequence Assembler

The genome assembly problem is to generate the original DNA sequence of the organism from a large set of short overlapping fragments. MIRA is an open source assembler based on the Overlap Layout Consensus (OLC) graph model which addresses the assembly problem and is widely used by biologists [1,2]. Like other assemblers MIRA takes a long time to compute the assembly for large number of sequence...

SIMULATED ANNEALING ALGORITHM FOR SELECTING SUBOPTIMAL CYCLE BASIS OF A GRAPH

The cycle basis of a graph arises in a wide range of engineering problems and has a variety of applications. Minimal and optimal cycle bases reduce the time and memory required for most of such applications. One of the important applications of cycle basis in civil engineering is its use in the force method to frame analysis to generate sparse flexibility matrices, which is needed for optimal a...

Journal:
  • Bioinformatics

Volume 21, Issue 8

Pages: -

Publication date: 2005